Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable

Authors

  • Peter C Austin
  • Ewout W Steyerberg
Abstract

BACKGROUND When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model.

METHODS An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition.

RESULTS Under the assumption of binormality with equality of variances, the c-statistic is given by the standard normal cumulative distribution function evaluated at a quantity that depends on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic is given by the standard normal cumulative distribution function evaluated at a quantity that depends on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition.

CONCLUSIONS The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
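The relationships described in RESULTS can be checked numerically. The abstract does not print the expressions themselves, so the sketch below assumes the standard binormal results: c = Phi(sigma * beta / sqrt(2)) for equal variances, where sigma is the within-group standard deviation and beta is the log-odds ratio per unit of the variable, and c = Phi((mu1 - mu0) / sqrt(sd0^2 + sd1^2)) for unequal variances, which equals Phi(d / sqrt(2)) for standardized difference d. The function names, the seed, and the parameter values are illustrative choices, not taken from the paper. The empirical c-statistic is computed directly from the explanatory variable; for a single-variable logistic model with a positive coefficient this should match the model's c-statistic, because the fitted probabilities are a monotone function of the variable.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def predicted_c_equal_var(sigma, beta):
    """Assumed closed form: c = Phi(sigma * beta / sqrt(2)), equal variances."""
    return NormalDist().cdf(sigma * beta / math.sqrt(2))


def predicted_c_unequal_var(mu0, sd0, mu1, sd1):
    """Assumed closed form: c = Phi((mu1 - mu0) / sqrt(sd0^2 + sd1^2))."""
    return NormalDist().cdf((mu1 - mu0) / math.sqrt(sd0 ** 2 + sd1 ** 2))


def empirical_c(x0, x1):
    """Share of (non-case, case) pairs where the case value is larger.

    x0 holds values for those without the condition, x1 for those with it.
    Tied pairs count as one half.
    """
    ordered = sorted(x0)
    total = 0.0
    for value in x1:
        below = bisect_left(ordered, value)           # non-cases strictly smaller
        ties = bisect_right(ordered, value) - below   # non-cases equal to value
        total += below + 0.5 * ties
    return total / (len(x0) * len(x1))


def simulate_equal_var(sigma, beta, n_per_group, seed):
    """Draw both groups from normals with common SD sigma.

    Under equal-variance binormality the log-odds ratio is
    beta = (mu1 - mu0) / sigma^2, so the mean shift is beta * sigma^2.
    """
    rng = random.Random(seed)
    mu0 = 0.0
    mu1 = mu0 + beta * sigma ** 2
    x0 = [rng.gauss(mu0, sigma) for _ in range(n_per_group)]
    x1 = [rng.gauss(mu1, sigma) for _ in range(n_per_group)]
    return empirical_c(x0, x1)


if __name__ == "__main__":
    sigma, beta = 1.5, 0.8  # illustrative values only
    print("predicted c:", predicted_c_equal_var(sigma, beta))
    print("empirical c:", simulate_equal_var(sigma, beta, 20000, seed=12345))
```

Doubling sigma while halving beta leaves the predicted c-statistic unchanged in this sketch, which illustrates the point made in CONCLUSIONS: the same odds ratio implies different discrimination in populations with different heterogeneity.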

Similar resources

Using latent variables in a logistic regression model to remove the effect of multicollinearity in the analysis of some factors related to breast cancer

Background and Objectives: Logistic regression is one of the most widely used generalized linear models for analyzing the relationships between one or more explanatory variables and a categorical response. Strong correlations among explanatory variables (multicollinearity) reduce the efficiency of the model to a considerable degree. In this study we used latent variables to reduce the effects of ...

Binary Regression With a Misclassified Response Variable in Diabetes Data

Objectives: Categorical data analysis is very important in statistics and the medical sciences. When the binary response variable is misclassified, fitting the model yields biased estimates of the adjusted odds ratios. The present study aimed to use a method to detect and correct misclassification error in the response variable of Type 2 Diabetes Mellitus (T2DM), applying binary ...

FUZZY LOGISTIC REGRESSION: A NEW POSSIBILISTIC MODEL AND ITS APPLICATION IN CLINICAL VAGUE STATUS

Logistic regression models are frequently used in clinical research, particularly for modeling disease status and patient survival. In practice, clinical studies have several limitations. For instance, in the study of rare diseases or due to ethical considerations, we can only have small sample sizes. In addition, the lack of suitable and advanced measuring instruments leads to non-precise observatio...

Logistic Regression Analysis of Some Factors Influencing Incidence of Retained Placenta in a Holstein Dairy Herd

To investigate the effects of certain factors on the rate of retained placenta, 2844 calving records from 1288 Holstein cows in one herd were used. These cows calved during the period from 2001 to 2007. A generalized linear model was applied to analyze the data, with logistic regression as the statistical model. In the model, fixed effects of year, season (warm or cold) an...

Phase II logistic profile monitoring

In many industrial and non-industrial applications, the quality of a process or product is characterized by a relationship between a response variable and one or more explanatory variables. This relationship is referred to as a profile. In the past decade, profile monitoring has been extensively studied for normal response variables, but little attention has been paid to the profile with the ...

Journal title:

Volume 12  Issue 

Pages  -

Publication date 2012